Assignment 1: Network visualization of terrorist connections

1.1

#Duc version
nodes <- read.table("trainMeta.dat")
links <- read.table("trainData.dat")
colnames(links) <- c("from","to","value")
colnames(nodes) <- c("label","group")

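# Create node ids so that they match the "from"/"to" columns of the links
nodes$id <- as.numeric(rownames(nodes))
# Reorder columns to id, group, label (the id must come first)
nodes <- nodes[,3:1]
# Node value (size) is the strength, i.e. the weighted degree, of each node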
g <- graph_from_data_frame(links, directed = FALSE, vertices = nodes)
value <- as.numeric(strength(g))
nodes$value <- value
rm(value)

net1.1 <-visNetwork(nodes, links, main = "Terrorist connections") %>%
  visPhysics(solver="repulsion") %>% 
  visLegend(main = "Bombing group") %>% 
  visOptions(highlightNearest = list(enabled =TRUE, degree = 1),
             nodesIdSelection = TRUE)
net1.1

Most of the people who placed the explosives (blue) are connected with each other, except Anuar Asri Rifaat and Abddenabi Koujma (although these two are still connected within the network). Compared with most of the yellow nodes, the blue nodes are bigger, which means that these people have stronger (more frequent) connections. Jamal Zougam and Mohamed Chaoui are the two biggest nodes in the blue cluster. They seem to be hubs of the blue cluster, and they are also strongly connected with Imad Eddin Barakat and several yellow nodes. Thus, although only 12 people placed the bombs, many people in the yellow cluster may also be very dangerous.

Additionally, there are six people (outliers) who have no connection with anyone else.
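
The node sizes in this plot come from `strength(g)`, the weighted degree of each node. As a minimal sketch of what that quantity is, the following base R code computes it by hand for a small, made-up edge list (`toy_links` and its values are hypothetical and only mimic the from/to/value layout of `links`):

```r
# Toy edge list in the same from/to/value layout as `links` (values are made up)
toy_links <- data.frame(
  from  = c(1, 1, 2, 3),
  to    = c(2, 3, 3, 4),
  value = c(2, 1, 3, 1)
)

# In an undirected graph every edge adds its weight to both of its endpoints
endpoint <- c(toy_links$from, toy_links$to)
weight   <- c(toy_links$value, toy_links$value)

# Sum the weights per node: this is the weighted degree (strength)
toy_strength <- tapply(weight, endpoint, sum)
toy_strength
```

Nodes that appear in no link are simply absent from this hand computation, whereas building the graph with `vertices = nodes` keeps them with strength 0, which is why the isolated people still show up in the plot as the smallest nodes.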

1.2

net1.1 %>%visOptions(highlightNearest = list(enabled =TRUE, degree = 2), nodesIdSelection = TRUE)

Jamal Zougam is the person with the largest information network, i.e. he reaches the most people within two steps. He was one of six men implicated in the 2004 Madrid train bombings.
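
Highlighting with `degree = 2` shows everyone reachable within two steps of the selected node. A rough way to check such a reading numerically is to count, for each node, how many others lie within two steps. The sketch below does this in base R on a hypothetical six-node graph (`toy_edges` is made up); on the real graph, igraph's `ego_size(g, order = 2)` should give a comparable count, although that count includes the node itself.

```r
# Toy undirected edge list with hypothetical ids 1..6
toy_edges <- data.frame(from = c(1, 2, 3, 5), to = c(2, 3, 4, 6))
n <- 6

# Symmetric 0/1 adjacency matrix
A <- matrix(0, n, n)
A[cbind(toy_edges$from, toy_edges$to)] <- 1
A[cbind(toy_edges$to, toy_edges$from)] <- 1

# A has the one-step links, A %*% A counts two-step walks;
# any non-zero entry means "reachable within two steps"
reach2 <- (A + A %*% A) > 0
diag(reach2) <- FALSE   # a node does not count as its own neighbour

# Size of each node's two-step neighbourhood
reach_count <- rowSums(reach2)
reach_count
```

The node with the largest count is the one whose information can spread furthest within two steps.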

1.3

#1.3
ceb <- cluster_edge_betweenness(g) 
# change "groups" to "clusters"
nodes$group=ceb$membership
visNetwork(nodes,links, main = "Terrorist connections")%>%
  # visIgraphLayout() %>%
  visPhysics(solver = "repulsion") %>%
  visOptions(highlightNearest = list(enabled =TRUE, degree = 1),
             nodesIdSelection = TRUE)

No, the clusters from step 1.1 are not recovered here. In Figure 1.3, the original blue cluster is split into several groups (blue, yellow, orange, etc.). Only the biggest nodes, such as Jamal Zougam and Mohamed Chaoui, still belong to the same group. Most nodes with few connections are separated into other small groups.
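
The comparison above was made by eye. One possible way to back it up is a cross-tabulation of the known groups against the detected clusters. The sketch below uses made-up vectors (`known` and `cluster` are hypothetical); with the real data, the original group labels would have to be saved before `nodes$group` is overwritten with `ceb$membership`.

```r
# Hypothetical labels for six people: known group vs. detected cluster
known   <- c("bomber", "bomber", "bomber", "other", "other", "other")
cluster <- c(1, 1, 2, 2, 3, 3)

# Rows: known group, columns: detected cluster
agreement <- table(known, cluster)
agreement

# Share of each known group that ends up in its single largest cluster;
# a value close to 1 would mean the group was recovered almost intact
purity <- apply(agreement, 1, max) / rowSums(agreement)
purity
```

A known group that is spread over many columns, as the blue group is in Figure 1.3, gets a low value here.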

1.4

#1.4
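# as_adjacency_matrix() is the current name of the deprecated get.adjacency()
netm <- as_adjacency_matrix(g, attr = "value", sparse = FALSE)
# Label rows and columns with the node labels (the graph has no "media" attribute)
colnames(netm) <- V(g)$label
rownames(netm) <- V(g)$label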
rowdist<-dist(netm)
order1<-seriate(rowdist, "HC")
ord1<-get_order(order1)
reordmatr<-netm[ord1,ord1]
name <- nodes$label[as.numeric(ord1)]
plot_ly(x=name, y=name, z=reordmatr, type="heatmap")

The most pronounced cluster is in the top right corner of the heat map. The names in this cluster include Jamal Zougam, Amer Azizi, Abu Musad Alsakaoui, Mohamed Chaoui, etc. All of them have frequent and strong connections to other people (shown in Figure 1.3), but not all of them belong to the bombing group (shown in Figure 1.1).
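
The block structure in the heat map appears because rows with similar connection profiles are placed next to each other. As a small sketch of that idea, the code below reorders a hypothetical 5 x 5 weight matrix with base R's `hclust()`. This is only a stand-in for `seriate(rowdist, "HC")`, so the exact order it produces need not match what the seriation package returns.

```r
# Hypothetical symmetric weight matrix: a, c, e are mutually linked,
# and b, d are linked, but the two groups are interleaved
m <- matrix(c(0, 0, 3, 0, 3,
              0, 0, 0, 2, 0,
              3, 0, 0, 0, 3,
              0, 2, 0, 0, 0,
              3, 0, 3, 0, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(letters[1:5], letters[1:5]))

# Distances between rows, then the leaf order of a hierarchical clustering
hc  <- hclust(dist(m))
ord <- hc$order

# Apply the same order to rows and columns, as done for the heat map
reordered <- m[ord, ord]
reordered
```

After reordering, the members of each group sit in adjacent rows and columns, so their links form compact blocks near the diagonal instead of being scattered.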

Assignment 2: Animations of time series data

2.1 Animated bubble chart

#2.1
pal <- c("red", "blue", "darkgreen", "black", "yellow", "purple", "pink", "orange")
p2.1 <- Oilcoal %>%
  plot_ly(
    x = ~Coal, 
    y = ~Oil, 
    size = ~Marker.size, 
    color = ~Country, 
    colors = pal,
    frame = ~Year, 
    text = ~Country, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  ) 

p2.1

As the bubble chart shows, the US and China consume increasingly more oil and coal than the other countries. This is plausible, since the US is the most developed of these countries and therefore consumes an enormous amount of fuel. Around the 1980s, China developed very quickly (following the Chinese economic reform) and consumed a lot of coal, which it also produces itself. After that, to meet the demands of development, China's consumption of both oil and coal increased dramatically, which has contributed to its present level of development. In addition, India's consumption increased noticeably over these four decades. In general, the consumption of the other countries does not seem to change much.

2.2 Two similar countries

Based on our observation, China and India seem to have the same motion pattern. To support this, we draw a tracking path plot of all countries below. It is clear that China and India show the same pattern (although China covers a much larger range).

#2.2
plot_ly(Oilcoal, x=~Coal, y=~Oil,type = 'scatter', mode= "lines", split =~Country, color = ~Country, colors = pal)%>%layout(title = "Tracking path plot from 1965 to 2009")
Oilcoal_Fillted <- Oilcoal[Oilcoal$Country %in% c("China","India"),]

p2.2 <- plot_ly(Oilcoal_Fillted,
    x = ~Coal, 
    y = ~Oil,
    color =~Country, 
    colors = c("red", "blue"),
    frame =~Year,
    size = ~Marker.size,
    text = ~Country, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  ) 
p2.2

As the animated bubble chart shows, both countries increased their consumption of oil and coal, except in the period 1996-1999. In that period there is a slight decrease in coal consumption.

Historically, both China and India are old civilizations. Both countries have large populations and started modernization and industrialization later than other countries, especially the European countries and Japan. However, they developed quickly on the basis of their huge domestic markets and have been catching up with the other countries.

2.3 Animated bar chart

We calculate the p value (the column Oil_p, the percentage of oil in the combined oil and coal consumption) with the formula P = Oil/(Oil + Coal)*100 and add the extra rows as required. The first rows of the new data frame are:

##       Country Year    Coal    Oil Marker.size  X     Oil_p
## 46     Brazil 1965   1.735 14.878         0.5 NA 89.556372
## 46.1   Brazil 1965   1.735 14.878         0.5 NA  0.000000
## 226     China 1965 165.633 10.960         1.0 NA  6.206362
## 226.1   China 1965 165.633 10.960         1.0 NA  0.000000
## 91     France 1965  45.055 53.887         0.5 NA 54.463221
## 91.1   France 1965  45.055 53.887         0.5 NA  0.000000

We then create an animated line plot of Oil_p versus Country, which effectively turns it into an animated bar chart.
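
The construction can be illustrated on two rows copied from the table above (Brazil and China in 1965). This is only a minimal sketch with a hypothetical data frame `toy`; the report's actual preparation code is in the appendix.

```r
# Two rows copied from the table above (Brazil and China, 1965)
toy <- data.frame(
  Country = c("Brazil", "China"),
  Year    = c(1965, 1965),
  Coal    = c(1.735, 165.633),
  Oil     = c(14.878, 10.960)
)

# Percentage of oil in the combined oil and coal consumption
toy$Oil_p <- toy$Oil / (toy$Oil + toy$Coal) * 100

# Duplicate every row and set Oil_p to 0 in the copies, so that each
# country contributes two points, (Country, Oil_p) and (Country, 0)
copies <- toy
copies$Oil_p <- 0
toy_expanded <- rbind(toy, copies)
toy_expanded <- toy_expanded[order(toy_expanded$Year, toy_expanded$Country), ]

round(toy_expanded$Oil_p, 2)
```

With `mode = "lines"`, the two points of each country are joined by a vertical segment from 0 up to Oil_p, which is what makes the line plot look like a bar chart.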

p2.3b <- Oilcoal.expanded %>%
  plot_ly(
    x = ~Country, 
    y = ~Oil_p, 
    size = 2, 
    color = ~Country, 
    colors = pal,
    frame = ~Year, 
    text = ~Country, 
    hoverinfo = "text",
    type = 'scatter',
    mode = "lines",
    showlegend = F
  )
p2.3b

Compared with the animated bubble chart, the animated bar chart removes the large differences in scale among countries. By focusing on the ratio (p value), it shows clearly how the countries compare (y-values) and how each country's consumption mix changes over the years. On the other hand, the bubble chart shows more information about the actual values of the two variables, which makes it easier to follow the changes and behavior of the individual countries.

After 45 years, the proportion of fuel consumption related to oil is higher than 60% in most countries, except China and India. Brazil has the highest oil ratio throughout these decades, but France nearly catches up with Brazil in 2009. Besides France, Germany and the UK also increase their proportions, although they grow only slightly from 1980 to 1990. Japan's proportion increases but falls back in 2009 to around its 1965 level. For the US, the proportion does not seem to change much.

2.4 Elastic transition

#2.4
p2.4 <- p2.3b%>%animation_opts(
 easing = "elastic", redraw = F)
p2.4

In this task, we add the elastic transition to the previous plot. The bar chart looks exactly the same but has a different transition animation. The bars in task 2.3 move more smoothly; compare the easeInOutExpo curve with the elastic curves at easings.net (https://easings.net/).

Compared with the original animated bar chart, the new bar chart makes the large changes of individual countries over the years easy to notice. The disadvantage is that some information may be missed with this kind of animation. For example, the p values of Brazil are 87.0 in 1993 and 87.5 in 1994. A viewer can easily overlook such a small change while other bars are changing abruptly.

2.5 Guided tour

tour

In this task, we create a guided 2D tour of coal consumption, in which each country is a variable (axis) and each year is an observation (point).
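
Before the tour can run, the long table (one row per country and year) has to become a matrix with one row per year and one column per country, with each column standardized. The appendix does this with `dcast()` and `scale()`. The sketch below shows the same two steps in base R on made-up numbers (`toy_long` and its values are hypothetical), using `tapply()` in place of `dcast()`.

```r
# Hypothetical long-format data: two countries, three years
toy_long <- data.frame(
  Year    = rep(c(2000, 2001, 2002), times = 2),
  Country = rep(c("A", "B"), each = 3),
  Coal    = c(10, 20, 30, 1, 2, 6)
)

# Long to wide: one row per year, one column per country
wide <- tapply(toy_long$Coal, list(toy_long$Year, toy_long$Country), sum)

# Standardize each country column to mean 0 and standard deviation 1,
# so that large consumers do not dominate the projections by scale alone
mat_toy <- scale(wide)
mat_toy
```

Each row of the resulting matrix is one point in the tour, and each column corresponds to one of the labelled axes in the projection.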

Based on this plot, projection step 10 clearly shows the outlier year: 2009 is far from all the others.

Additionally, the clusters correspond to different ranges of years. For example, projection 0 covers a wide range of years and may show two clusters (from 1976 to 1983 and from 1985 to 2003). In contrast, projection step 10, which covers a narrow range of years, shows only one cluster.

China has the highest contribution to this projection. This is understandable, because it consumes much more coal than the other countries. To be more specific, let us look at the time series plot below.

plot_ly(Oilcoal, x =~Year, y=~Oil_p,split =~Country, color = ~Country, colors = pal, type = "scatter", mode= "lines")

Appendix

Please press the code button to see the code of this report.

library(tourr)
library(plotly)
library(ggraph)
library(igraph)
library(visNetwork)
library(seriation)
library(reshape2)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE, include=TRUE)
#Duc version
nodes <- read.table("trainMeta.dat")
links <- read.table("trainData.dat")
colnames(links) <- c("from","to","value")
colnames(nodes) <- c("label","group")

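# Create node ids so that they match the "from"/"to" columns of the links
nodes$id <- as.numeric(rownames(nodes))
# Reorder columns to id, group, label (the id must come first)
nodes <- nodes[,3:1]
# Node value (size) is the strength, i.e. the weighted degree, of each node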
g <- graph_from_data_frame(links, directed = FALSE, vertices = nodes)
value <- as.numeric(strength(g))
nodes$value <- value
rm(value)

net1.1 <-visNetwork(nodes, links, main = "Terrorist connections") %>%
  visPhysics(solver="repulsion") %>% 
  visLegend(main = "Bombing group") %>% 
  visOptions(highlightNearest = list(enabled =TRUE, degree = 1),
             nodesIdSelection = TRUE)
net1.1
net1.1 %>%visOptions(highlightNearest = list(enabled =TRUE, degree = 2), nodesIdSelection = TRUE)

#1.3
ceb <- cluster_edge_betweenness(g) 
# change "groups" to "clusters"
nodes$group=ceb$membership
visNetwork(nodes,links, main = "Terrorist connections")%>%
  # visIgraphLayout() %>%
  visPhysics(solver = "repulsion") %>%
  visOptions(highlightNearest = list(enabled =TRUE, degree = 1),
             nodesIdSelection = TRUE)
#1.4
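# as_adjacency_matrix() is the current name of the deprecated get.adjacency()
netm <- as_adjacency_matrix(g, attr = "value", sparse = FALSE)
# Label rows and columns with the node labels (the graph has no "media" attribute)
colnames(netm) <- V(g)$label
rownames(netm) <- V(g)$label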
rowdist<-dist(netm)
order1<-seriate(rowdist, "HC")
ord1<-get_order(order1)
reordmatr<-netm[ord1,ord1]
name <- nodes$label[as.numeric(ord1)]
plot_ly(x=name, y=name, z=reordmatr, type="heatmap")
#2
input_path <- "Oilcoal.csv"
Oilcoal <- read.csv2(file = input_path, sep = ";")
#2.1
pal <- c("red", "blue", "darkgreen", "black", "yellow", "purple", "pink", "orange")
p2.1 <- Oilcoal %>%
  plot_ly(
    x = ~Coal, 
    y = ~Oil, 
    size = ~Marker.size, 
    color = ~Country, 
    colors = pal,
    frame = ~Year, 
    text = ~Country, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  ) 

p2.1
#2.2
plot_ly(Oilcoal, x=~Coal, y=~Oil,type = 'scatter', mode= "lines", split =~Country, color = ~Country, colors = pal)%>%layout(title = "Tracking path plot from 1965 to 2009")

Oilcoal_Fillted <- Oilcoal[Oilcoal$Country %in% c("China","India"),]

p2.2 <- plot_ly(Oilcoal_Fillted,
    x = ~Coal, 
    y = ~Oil,
    color =~Country, 
    colors = c("red", "blue"),
    frame =~Year,
    size = ~Marker.size,
    text = ~Country, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  ) 
p2.2


#2.3
Oilcoal$Oil_p <- Oilcoal$Oil/(Oilcoal$Oil + Oilcoal$Coal)*100
Oilcoal.expanded <- Oilcoal[rep(row.names(Oilcoal),2), ]
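# The second half of the expanded data holds the duplicated rows: set their
# Oil_p to 0 (index derived from the data instead of hard-coding 361:720)
dup_rows <- (nrow(Oilcoal) + 1):nrow(Oilcoal.expanded)
Oilcoal.expanded$Oil_p[dup_rows] <- 0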
Oilcoal.expanded <- Oilcoal.expanded[order(Oilcoal.expanded$Year, Oilcoal.expanded$Country),]

head(Oilcoal.expanded, 6)
p2.3b <- Oilcoal.expanded %>%
  plot_ly(
    x = ~Country, 
    y = ~Oil_p, 
    size = 2, 
    color = ~Country, 
    colors = pal,
    frame = ~Year, 
    text = ~Country, 
    hoverinfo = "text",
    type = 'scatter',
    mode = "lines",
    showlegend = F
  )
p2.3b

#2.4
p2.4 <- p2.3b%>%animation_opts(
 easing = "elastic", redraw = F)
p2.4
#2.5
# The code in this part follows the example code by Prof. Sysoev, which is published on the course website.
Oilcoal2 <-  dcast(Oilcoal, Year~Country, value.var = "Coal")
Oilcoal2 <- as.matrix(Oilcoal2)
rownames(Oilcoal2) <- Oilcoal2[,"Year"]
mat <- scale(Oilcoal2[,-1])

set.seed(3346)
tour<- new_tour(mat, guided_tour(cmass), NULL)
steps <- c(0, rep(1/15, 180))
Projs<-lapply(steps, function(step_size){  
  step <- tour(step_size)
  if(is.null(step)) {
    .GlobalEnv$tour<- new_tour(mat, guided_tour(cmass), NULL)
    step <- tour(step_size)
  }
  step
})

# projection of each observation
tour_dat <- function(i) {
  step <- Projs[[i]]
  proj <- center(mat %*% step$proj)
  data.frame(x = proj[,1], y = proj[,2], state = rownames(mat))
}

# projection of each variable's axis
proj_dat <- function(i) {
  step <- Projs[[i]]
  data.frame(
    x = step$proj[,1], y = step$proj[,2], variable = colnames(mat)
  )
}

stepz <- cumsum(steps)

# tidy version of tour data

tour_dats <- lapply(1:length(steps), tour_dat)
tour_datz <- Map(function(x, y) cbind(x, step = y), tour_dats, stepz)
tour_dat <- dplyr::bind_rows(tour_datz)

# tidy version of tour projection data
proj_dats <- lapply(1:length(steps), proj_dat)
proj_datz <- Map(function(x, y) cbind(x, step = y), proj_dats, stepz)
proj_dat <- dplyr::bind_rows(proj_datz)

ax <- list(
  title = "", showticklabels = FALSE,
  zeroline = FALSE, showgrid = FALSE,
  range = c(-1.1, 1.1)
)

# for nicely formatted slider labels
options(digits = 3)
tour_dat <- highlight_key(tour_dat, ~state, group = "A")
tour <- proj_dat %>%
  plot_ly(x = ~x, y = ~y, frame = ~step, color = I("black")) %>%
  add_segments(xend = 0, yend = 0, color = I("gray80")) %>%
  add_text(text = ~variable) %>%
  add_markers(data = tour_dat, text = ~state, ids = ~state, hoverinfo = "text") %>%
  layout(xaxis = ax, yaxis = ax, showlegend = F)


tour
plot_ly(Oilcoal, x =~Year, y=~Oil_p,split =~Country, color = ~Country, colors = pal, type = "scatter", mode= "lines")